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ABSTRACT 

As galaxy surveys become larger and more complex, keeping track of the completeness, mag- 
nitude limit, and other survey parameters as a function of direction on the sky becomes an 
increasingly challenging computational task. For example, typical angular masks of the Sloan 
Digital Sky Survey contain about N = 300,000 distinct spherical polygons. Managing masks 
with such large numbers of polygons becomes intractably slow, particularly for tasks that run 
in time O (-/V^) with a naive algorithm, such as finding which polygons overlap each other 
Here we present a "divide-and-conquer" solution to this challenge: we first split the angular 
mask into predefined regions called "pixels," such that each polygon is in only one pixel, and 
then perform further computations, such as checking for overlap, on the polygons within each 
pixel separately. This reduces O (N^) tasks to O (iV), and also reduces the important task 
of determining in which polygon(s) a point on the sky lies from O (N) to O (1), resulting in 
significant computational speedup. Additionally, we present a method to efficiently convert 
any angular mask to and from the popular HEALPix format. This method can be generi- 
cally applied to convert to and from any desired spherical pixelization. We have implemented 
these techniques in a new version of the MANGLE software package, which is freely available 
at http : //space .mit . edu/home/tegmark/mangle/, along with complete docu- 
mentation and example applications. These new methods should prove quite useful to the 
astronomical community, and since MANGLE is a generic tool for managing angular masks 
on a sphere, it has the potential to benefit terrestrial mapmaking applications as well. 

Key words: large-scale structure of Universe - methods: data analysis - surveys 
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1 INTRODUCTION 

Over the past few decades, galaxy surveys have provided a wealth 
of information about the large-scale structure of our Universe, and 
the next generation of surveys currently being planned promises to 
provide even more insight. In order to realize the full potential of 
upcoming surveys, it is essential to avoid unnecessary errors and 
approximations in the way they are analysed. The tremendous vol- 
umes of data produced by these new surveys will shrink statistical 
uncertainty to unprecedented levels, and in order to take advantage 
of this we must ensure that the systematic uncertainties can keep 
pace. The purpose of this paper is to maximize the scientific util- 
ity of next-generation surveys by providing methods for processing 
angular masks as rapidly and accurately as possible. 

Angular masks of a galaxy survey are functions of direc- 
tion on the sky that model the survey completeness, magnitude 
limit, seeing, dust extinction, or other parameters that vary across 
the sky. The earliest galaxy redshift surve ys - the first C enter for 
Astrophysics redshift survey (CfAl: iHuchra et alj|l983h and the 
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first Southern Sky Redshift Survey (SSRSl; lda~Costa et alj[l99lh 
- had simple angular masks defined by boundaries in declina- 
tion and Galactic la titude. The next generation of su rveys - IRAS 
jStraussetalll 19921) and PCSz jSaunders et ai]|2000() - had some- 
what more complex masks, with some regions of high contamina- 
tion excluded from the survey. 

The present generation of su rveys - the Two Degree Field 
Galaxy Redshift Survey (2dFGRS: 'Colless et al.ll200lL l2003h and 
the Sloan Digital Sky Survey (SDSS; York et al. 2000!) - consist of 
photometric surveys that identify galaxies and measure their angu- 
lar positions combined with spectroscopic surveys that measure a 
redshift for each galaxy to determine its distance from us. Angu- 
lar masks are useful for describing parameters for both photomet- 
ric and spectroscopic surveys - for example, seeing and magnitude 
limit are key parameters to model in photometric surveys, and the 
survey completeness - i.e., the fraction of photometrically selected 
target galaxies for which a spectrum has been measured - is vital 
for analysing spectroscopic surveys. 

The angular masks of SDSS and 2dFGRS consist of circu- 
lar fields defined by the spectroscopic plates of the redshift survey 
superimposed on an angular mask of the parent photometric sur- 
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Figure 1. Speed trials for a series of portions of the SDSS DR5 mask with and without pixeUzation. (a) Time required for pixelization, snapping, balkanization 
and unification of the mask, (b) Time required to identify in which polygon each of the ~400,000 SDSS DR5 galaxies lies. Each set of trials is fitted with 
a power law to show how the processing time scales with the number of polygons A'^. Also shown on the x-axis are the number of polygons in the 2dFGRS 
mask, the SDSS DR5 mask, and conservative estimates for the Pan-STARRS and LSST large-scale structure masks based on scahng up 2dFGRS. 



vey. 2dFGRS uses the Automatic Plate Measurement (APM) sur- 
vey (Maddox et al. 1990a,b, 1996) as its parent photometric survey 
and covers approximately 1500 square degrees. The APM angular 
mask consists of 269 5° x 5° photographic plates which are drilled 
with numerous holes to avoid bright stars, satellite trails, plate de- 
fects, and so forth. Combining the photographic plates, holes, and 
spectroscopic fields gives a total of 3525 polygons that define the 
spectroscopic angular mask of 2dFGRS. 

The SDSS covers a larger area on the sk y - 5740 square de- 
grees in Data Release 5 (DR%; Adelman-Mc Carthv et"al]|2007|) 
- and has a yet more complicated angular mask than 2dFGRS. 
The SDSS photometric survey is done by drift-scanning: each scan 
across the sky covers six long, narrow scanlines, and the gaps be- 
tween these lines are filled in with a second scan slightly offset 
from the first, producing a "stripe" about 2.5° wide assembled from 
12 scanlines. In addition to the fairly intricate pattern produced by 
this scanning strategy, there are nearly 250,000 holes masked out 
of the photometric survey for various reasons, plus the circular 3° 
spectroscopic fields. Combining all of these elements produces an 
angular mask for the spectroscopic survey that contains 340,351 
polygons. 

To accurately manage t he 2dFGRS and SDSS angular masks, 
iHamilton & TegmarkI ( l2004h developed a suite of general-purpose 
software called mangle, which performs several important pro- 
ce dures on angular masks using computational methods detailed 
in IHamilto n ( 1993a b). This software has proved to be a valu- 
able resource to the astronomical communit y: it has been used 
in se veral analyses of ga l axy survey data (Tegmark et al. II200I 
20041 : iMandelbaum et al] [2005: Hikage et al. 2005; Parketal. 
2005; Conrov et al. 2005; Park et al. 2007; Nishimichi et al. 2003; 
She n et al. 2007; Wang et al. 2007; Tinker et al. 2007). Addition- 
ally, it was used extensively in the preparation of the New York Uni- 
versity Value-Added Galaxy Catalog (NYU-VAGC: lBlanton et al.l 



l2005l) . which has been used as the basis for almost all publications 
on large-scale structure by the SDSS collaboration. 

However, many of functions in the original version of MAN- 
GLE run in O {N^^ time, which becomes quite computationally 
challenging as the size and complexity of surveys continues to in- 
crease - computations involving the SDSS mask can take several 
months of CPU time. In this paper we present new algorithms 
that can process complicated angular masks such as SDSS dra- 
matically faster with no loss of accuracy. Our method is based 
on splitting an angular mask into pixels, reducing the processing 
time to O (N) by adding an O {N log A*') preprocessing step. Sim- 
ilar methods based on hierarchical spatial subdivisions have been 
found to be useful in the field of computational geometry (see, e.g., 
iGoodman & O'Rourkel200i ). but have not previously been applied 
to angular masks in an astronomy context. 

This will be especially useful for next-generation 
surv eys, such as the Dar k Energy Survey (DES; 
.Dark Energy Survey Collaboration! 12005) and surveys done 
with the Wide-Field Multi-Object Spectrog raph (WFMOS; 
lYamamoto et al.l I2OO6I : iGlazebrook et al.l I2OO6I) , the Panoramic 
Survey Tele scope and Rapid Response System (Pan-STARRS; 
'Kaiser 2004), and the Large Synoptic Survey Telescope (LSST; 
Jyson 2002; Stubbs et al. 2004; Tyson 2006; LSST Collaboration 
Fooel) . DES, Pan-STARRS, and LSST will perform photometric 
surveys and use techniques for estimating redshifts based on the 
photometric information; WFMOS will perform spectroscopic 
surveys using one of the upcoming photometric surveys for 
target selection. The methods we present here are useful for 
both photometric and spectroscopic surveys - keeping track of 
factors such as seeing and dust extinction could pro ve particularly 
importMt for photometr i c redshift determin ations jCollister et al.l 
l2007l : IOvaizu et al.ll2008l ; lBanerii et alj2008h . 

The proposed large-scale structure survey to be produced by 
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Figure 2. A portion of the SDSS DR5 angular mask ifilanton et al.ll200^ ; lAdelman-McCarthv et al]|2007l) . Left: Polygons defining the mask: spectroscopic 
plates and lines delineating different scans and spectroscopic plates are shown in black, and holes in the mask are shown in blue/gray. Right: Processed version 
of the mask, shaded according to survey completeness. 



Table 1. Definitions of Terms, in Alphabetical Order 



Term Definition 

boundary A set of edges bounding a polygon. 

cap A spherical disk, a region above a circle on the unit sphere. 

circle A line of constant latitude with respect to some arbitrary polar axis on the unit sphere, 

edge An edge is part of a circle. A polygon is enclosed by its edges. 

great circle A line of zero latitude with respect to some arbitrary polar axis on the unit sphere. A great circle is a circle, but a circle is not necessarily a 
great circle. 

mask The union of an arbitrary number of weighted polygons. 

pixel A special polygon that specifies some predefined region on the sky as part of a scheme for discretizing the unit sphere. Once a mask has 

been pixelized, each polygon in the mask is guaranteed to overlap with exactly one pixel, 
polygon The intersection of an arbitrary number of caps. 

rectangle A special kind of polygon, a rectangular polygon bounded by lines of constant longitude and latitude, 
vertex A point of intersection of two circles. A vertex of a polygon is a point where two of its edges meet. 

weight The weight assigned to a polygon. The spherical harmonics of a mask are the sum of the spherical harmonics of its polygons, each weighted 

according to its weight. A weight of 1 is the usual weight. A weight of signifies an empty polygon, a hole. In general the weight may be 
some arbitrary positive or negative real number 



Pan-STARRS will cover ~30,000 square degrees in 5 wavelength 
bands, with each field being observed ~50 times such that the im- 
ages can be co-added. A naive scaling up of the 2dFGRS area and 
number of polygons gives an estimate of ~2 x 10^ polygons for the 
final Pan-STARRS mask. Similarly, the LSST large-scale structure 
survey will cover ~20,000 square degrees in 6 bands, with ~200 
co-added images, suggesting ~8 x 10^ polygons for the LSST 
mask. The need for an improvement in mask processing speed is 
clearly illustrated in Fig.[T] with the old algorithms, the projected 
processing time for the LSST mask would be over 6000 years. With 
our new method, this time is reduced by a factor of ~24,000 to just 
ten days. 

In addition to developing faster algorithms for processing an- 
gular masks, we have also integrated the MANGLE utilities with 
HEALPix, a widely used tool for discretizing the celestial sphere 



jGorski et alj|2005h . The methods used by MANGLE are comple- 
mentary to HEALPix: mangle is best used for functions that are 
piecewise-constant in distinct regions of the sky, such as the com- 
pleteness of a galaxy survey. In contrast, HEALPix is optimal for 
describing functions that are continuously varying across the sky, 
such as the cosmic microwave background (CMB) or the amount 
of extinction due to Galactic dust. The ability to convert rapidly 
between these two formats allows for easy comparison of these 
two types of data without the unnecessary approximation inherent 
in discretizing an angular mask. Furthermore, converting a mask 
into HEALPix format allows users to take advantage of pre-existing 
HEALPix tools for rapidly computing spherical harmonics. 

The spectacular surveys on the horizon are preparing to gen- 
erate massive, powerful datasets that will be made publically avail- 
able - this in turn necessitates powerful and intuitive general- 
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Figure 3. A cartoon illustrating the process of balkanization (a) with no pixelization, (b) with pixelization. 



purpose tools that assist the community to do science with this 
avalanche of data. We provide a such tools with this new gener- 
ation of the MANGLE software and describe these new tools here. 
However, this paper is not a software manual (a manual is provided 
on the MANGLE website) but rather a description of the underly- 
ing algorithms. These tools have been utilized in recent analyses of 
SDSS data (Tegmark et al. 2006; Swanson et al. 2008); we are now 
making them public so others can use them as well. 

The outline of this paper is as follows: in ^ we give an 
overview of the terminology we use to describe angular masks and 
the basic tasks we wish to perform, and in Sj3]we detail our al- 
gorithms for accelerating these tasks and quantify their speed. We 
describe our methods for integrating MANGLE with HEALPix in ^ 
and summarize in SO 



2 MANGLE TERMINOLOGY 

The process of defining an angular mask of a galaxy survey in a 
generic way requi res a set of sta ndardized terminology. We use the 
terminology from lHamilton & Tegmark (2004) and present a sum- 
mary of it here. Our formal definition of an angular mask is a union 
of an arbitrary number of weighted angular regions bounded by ar- 
bitrary numbers of edges. The restrictions on the mask are 

(i) that each edge must be part of some circle on the sphere (but 
not necessarily a great circle), and 

(ii) that the weight within each subregion of the mask must be 
constant. 

This definition does not cover every theoretical possibility of how 
a piecewise-constant function on a sphere could be defined, but in 
practice it is sufficiently broad to accommodate the design of es- 
sentially any galaxy survey. Furthermore, as we discuss in detail in 
^ a curvilinear angular region (such as a HEALPix pixel) can be 
well-approximated by segments of circles at high resolution. As an 
example of a typical angular mask, we show a portion of the SDSS 



angular mask from Data Release 5 (DR5; lAdelman-McCarthv et al.l 
|2007)inFig.|2] 

The fundamental building block of an angular mask is the 
spherical polygon, which is defined as a region bounded by edges 
that are part of a circle on the sphere. An angular mask is thus 
the union of arbitrarily weighted non-overlapping polygons. For 
convenience, we provide a n updated version of a table from 
[Hamilton & TegmarkI ItW^ in Table [T] containing the definitions 
of key terms used in this paper. 

The basic procedure used by the MANGLE software to pro- 
cess an angular mask consi sts of the following st eps, w hich are 
described in greater detail in iHamilton & Tegmark Bw^ : 

(i) Snap 

(ii) Balkanize 

(iii) Weight 

(iv) Unify 

The snapping step identifies edges of polygons that are nearly co- 
incident and snaps them together so the edges line up exactly. This 
is necessary because nearly-coincident edges can cause significant 
numerical issues in later computations. In practice, this situation 
occurs when two polygons in a survey are intended to abut per- 
fectly, but are prevented from doing so by roundoff errors or nu- 
merical imprecision in the mask definition. There are several tun- 
able tolerances that control how close two edges can be before they 
get snapped together - these can be adjusted based on how pre- 
cisely the polygons defining the mask are specified. 

Balkanization is the process of resolving a mask into a set of 
non-overlapping polygons. It checks for overlap between each pair 
of polygons in the mask, and if the polygons overlap, it fragments 
them into non-overlapping pieces. After this is completed, it iden- 
tifies polygons having disconnected geometry and subdivides them 
into connected parts. The basic concept of balkanization is illus- 
trated in the top panel of Fig. [3] The purpose of this procedure is to 
define all of the distinct regions on the sky in which the piecewise- 
constant function we are intending to model might take on a dif- 
ferent value. For example, the 2dFGRS and SDSS spectroscopic 
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surveys generate masks containing many overlapping circles defin- 
ing each spectroscopic field observed. The SDSS spectrographs can 
observe 640 objects in each field, so if there are more than 640 de- 
sired target galaxies in the field, they might not all be observed. For 
example, one field may have spectra for 80 per cent of the targets, 
and a neighbouring field may have 90 per cent, but in the region 
where they overlap all of the targets may have been observed. This 
is how the survey completeness is determined and illustrates why 
balkanization is necessary. 

After the mask is balkanized, weights are assigned to each 
polygon, representing the value of the survey completeness (or any 
other desired parameter) in that region. The way this is done de- 
pends on how this information is provided for a given survey. For 
exarnple, the 2dFGRS mask software by Peder Norberg and Shaun 
Colqj provides a function that takes an angular position on the 
sky and returns the completeness, the magnitude limit, the photo- 
gr aphic plate number , and the value of the parameter ^ (described 
m ICoUess et al.ll200l h at that location. This information can be im- 
ported into MANGLE by producing a list of the midpoints of each of 
the polygons in the mask, applying the 2dFGRS software to calcu- 
late the value of the desired function at each midpoint, and assign- 
ing these weights to the appropriate polygon by using the "weight" 
routine in MANGLE. For the SDSS mask, the files provided by the 
NYU-VAGCQ ( Blanton et al. 2005) already include weights for the 
survey completeness, so this step is not needed. 

The final step of the processing is unification, which discards 
polygons with zero weight and combines neighbouring polygons 
that have the same weight. While not strictly necessary, this proce- 
dure clears out unneeded clutter and makes subsequent calculations 
more efficient. 

After an angular mask has been processed in this fashion, it 
can be used for function evaluation: i.e., given a point on the sky, 
determine in which one of the non-overlapping polygons it lies, 
and then get the weight of that polygon to obtain the value of the 
function at the input point. It can also be used for creating a random 
sample of points with the same selection function as the survey, cal- 
culating Data-Random {DR) and Random-Random (^RR) angular 
integrals, and computing the spherical harmonics of the mask. The 
MANGLE software provides utilities for all of these tasks. 



3 SPEEDUP: PIXELIZATION 

The tasks of snapping, balkanization, and unification all require 
comparing pairs of polygons in the mask. The brute-force method 
to accomplish this is simply to compare each polygon with ev- 
ery other polygon, which is what the original version of MAN- 
GLE did. This naive algorithm is O {N^'j, which is easily suffi- 
cient for masks such as the 2dFGRS mask, with an A'^ of a few 
thousand polygons. However, the SDSS mask has about 100 times 
as many polygons, and points to the need for a cleverer approach. 
The method we present here is a divide-and-conquer approach we 
dub "pixelization," which processes the mask so that each polygon 
needs to be compared with only a few nearby polygons. 

3.1 Pixelization concept 

The underlying concept of pixelization is as follows: before per- 
forming any snapping, balkanization, or unification, divide the 

^ http : //magnum. anu ■ edu ■ au/~TDFgg/Public/Release/Masks| 
^ http: //sdss .physics ■ nyu ■ edu/vagc| 



mask into predefined regions called "pixels" and split each poly- 
gon along the pixel boundaries such that each polygon is only in 
one pixel. Then for following tasks, polygons need only be com- 
pared with other polygons in the same pixel. 

The process of balkanization with and without pixelization is 
illustrated in Fig. [3] for three overlapping polygons. The top panel 
shows the unpixelized version: balkanization checks for overlap be- 
tween each pair of polygons, and then fragments it into seven non- 
overlapping polygons. 

The bottom panel shows the same process but using pixeliza- 
tion as a first step. First, each polygon is divided along the pixel 
boundaries, shown by the dotted lines. At this point, each of the 
four pixels shown has three polygons in it: the intersections be- 
tween that pixel and each of the original three polygons. Then 
balkanization is performed within each pixel: the three polygons 
in the upper left pixel are split into five non-overlapping polygons, 
and so forth, yielding a final set of 18 non-overlapping polygons. 
In this illustrative example, pixelization increases the complexity 
of the process, but in general it replaces the O (N^) algorithm 
for balkanization with one that is roug hly O (M {N/Mf) for iV 
polygons and M pixels, which is roughly O (N) if Al ~ A^. For 
large, complicated masks such as SDSS, this speeds up the process- 
ing time by a factor of ~1200. 

Once a mask has been pixelized, the important task of deter- 
mining in which polygon(s) a given point lies is sped up greatly as 
well: one merely has to calculate in which pixel the point lies, and 
then test if the point is in each polygon within that pixel. For a typ- 
ical pixelization scheme, the appropriate pixel number for a given 
point can be found with a simple formula, i.e. an O (1) calculation, 
so pixelization reduces the O (N) algorithm of testing every poly- 
gon in the mask to O (N/M). This means that with pixelization, 
this task does not depend on the total number of polygons in the 
mask at all if M ~ A^. 

It is important to note that the pixelization procedure makes no 
approximations to the original mask - the pixels are used simply as 
a tool to determine which polygons are close to each other, not as a 
means of discretizing the mask itself. A mask that has been balka- 
nized after pixelization contains all of the same information as it 
would without pixelization, except that it took a tiny fraction of the 
time to produce. Furthermore, unification can be applied across the 
whole mask rather than within each pixel, which effectively unpix- 
elizes the mask if desired. Thus there are essentially no drawbacks 
to using our pixelization procedure. 

3.2 Pixelization schemes 

3.2.1 Simple scheme 

The most straightforward means of pixelizing the sky is to use lines 
of equal azimuth and elevation as the pixel boundaries. The az- 
imuth and elevation typically correspond to celestial coordinates - 
right ascension (RA) and declination (Dec) - in a survey mask. In 
this scheme, the whole sky is split into quadrants along the equator 
and prime meridian to form the lowest resolution of pixelization. In 
MANGLE this is defined as resolution 1. 

To pixelize to higher resolutions, each pixel is split into four 
child pixels, with the boundaries at the midpoints of azimuth and 
of cos(elevation) within the pixel. This creates pixels with equal 
area. Thus resolution 2 consists of each of the four resolution 1 
quadrants split into four pixels each, and so forth. This procedure 
produces a hierarchical pi xel structure know n in computer science 
terminology as a quadtree dde Berg et ailioOQ) : there are 4*^ pixels 
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Figure 4. The full sky (shown in a Hammer-Aitoff projection in celestial coordinates) pixelized with the simple pixelization scheme at the four lowest 
resolutions. 



in this scheme at resolution r, and each of these pixels has four 
child pixels at resolution r + 1 and r parent pixels, one at each 
lower resolution. Resolution is defined as being the whole sky. 

In MANGLE, the pixels of the simple scheme are numbered as 
follows: the whole sky is pixel 0, the four quadrants of resolution 
1 are pixels 1, 2, 3, and 4, resolution 2 is pixels 5-20, and so forth. 
Thus just one number specifies both the resolution and the pixel lo- 
cation. At each resolution, the pixels are numbered in a ring pattern, 
starting from an elevation of 90° and along increasing azimuth for 
each ring of equal elevation. The simple scheme pixels at the four 
lowest resolutions are shown in Fig.|4l 



3.2.2 SDSSPix scheme 

Alternatively, the pixelization can be done such that it is more 
closely aligned with the mask of a given survey. In particular, a 
pixelization scheme called SDSSPi)Q has been developed for use 
with the SDSS geometry. Like the simple scheme, SDSSPix is a 
hierarchical, equal-area pixelization scheme, and it is based on the 
SDSS survey coordinates A and rj. As described in Stoughton et al, 
( l2002h . SDSS survey coordinates form a spherical coordinate sys- 
tem rotated relative to the celestial coordinate system. The poles 
are located at RA = 95°, Dec = 0° and RA = 275°, Dec = 0° 
(J2000), which are strategically located outside the SDSS covered 
area and in the Galactic plane, r) is the azimuthal angle, with lines 



^ |http: //lahmu.phyast .pitt ■ edu/~scranton/SDSSPix/| 



of constant r; being great circles perpendicular to the survey equa- 
tor, and A is the elevation angle, with lines of constant A being 
small circles parallel to survey equator. A = 0° , = 0° is located 
at RA = 185°, Dec = 32.5° with r) increasing northward. This 
configuration has been chosen such that the stripes produced by the 
SDSS scanning pattern lie along lines of constant 77. 

The SDSSPix base resolution is defined by 36 divisions in the 
r; direction (equally spaced in rf) and 13 in the A direction (equally 
spaced in cos A), for a total of 468 equal-area pixels. These di- 
visions are chosen such that at a special resolution level (called 
the "superpixel" resolution), there is exactly one pixel across each 
SDSS stripe. Finally, as in the simple scheme, higher resolutions 
are achieved by hierarchically subdividing each pixel at a given 
resolution into four smaller pixels. 

SDSSPix has been included in MANGLE by incorporating sev- 
eral routines from the SDSSPix software package available on- 
line.'^ The numbering scheme in the MANGLE implementation of 
SDSSPix differs somewhat from the internal SDSSPix numbering; 
in MANGLE, the entire sky is pixel 0, as in the simple scheme, but 
it has 117 child pixels (instead of 4), numbered from 1 to 117. 
These pixels, which comprise resolution 1 , are not part of the offi- 
cial SDSSPix scheme, but are created by combining sets of 4 pix- 
els from what is defined as resolution 2 in MANGLE. The resolu- 
tion 2 pixels are the official SDSSPix base resolution pixels, and 
are numbered from 118 to 585. Higher resolutions are constructed 
through the standard SDSSPix hierarchical division of each pixel 
into 4 child pixels. As in the simple scheme, the pixel number iden- 
tifies both the resolution and the pixel position. The superpixel res- 
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Figure 5. The full sky (shown in a Hammer-Aitoff projection in celestial coordinates) pixelized with the SDSSPix pixelization scheme at the four lowest 
resolutions. 



olution described above is defined as resolution 5 in MANGLE and 
contains a total of 7488 pixels. 

Mangle typically uses RA and Dec as its internal azimuth 
and elevation coordinates, so the SDSSPix pixels are constructed 
as rectangles in 77 — A coordinates and then rotated into celestial 
coordinates. The SDSSPix pixels at the four lowest resolutions are 
shown in Fig. [3] 



3.2.3 Other schemes 

The implementation of pixelization in MANGLE is designed to be 
flexible: it is simple for users to add their own scheme as well. The 
pixelization uses only four basic routines: 

(i) get_pixel: Given a pixel number, return a polygon represent- 
ing that pixel. 

(ii) which.pixel: Given a point on the sky and a resolution, re- 
turn the pixel number containing the point. 

(iii) get_child_pixels: Given a pixel number, return the numbers 
of its child pixels. 

(iv) get_parent_pixels: Given a pixel number, return the numbers 
of its parent pixels. 

Adding a new pixelization scheme simply requires creating appro- 
priate versions of these four routines. Note that it also requires that 
the pixels can be represented as polygons - this is not strictly the 
case for the HEALPix pixels, as discussed further in SQ] 



3.3 Pixelization algorithm 

The purpose of pixelization is to speed up the processing of angular 
masks, which means that the pixelization itself must be done with 
a clever, speedy algorithm or nothing will be gained. The naive 
algorithm is to search through all of the polygons in the mask for 
those that overlap that pixel. This is O [NM) for A'' polygons and 
M pixels, and is not sufficient for our purposes. 

Our fast pixelization algorithm is a recursive method that takes 
advantage of the hierarchical nature of the pixelization schemes. 
The method works as follows: 

(i) Start with all the mask polygons that are in pixel i. 

(ii) Create polygons for each child pixel of pixel i at the next 
resolution level. 

(iii) Split the mask polygons in pixel i along the child pixel 
boundaries such that each polygon lies within one child pixel. 

(iv) Repeat steps 1-3 for the mask polygons in each child pixel 
until desired stopping point is reached. 

Starting this with pixel i = 0, i.e., the whole sky, will pixelize the 
entire mask in O {N log Af ) time. Thus the pixelization does not 
add too much overhead time to the overall mask processing. 

There are two different methods for choosing the desired stop- 
ping point of the pixelization. The simplest method is to stop at a 
fixed resolution, such that the entire mask is pixelized with pixels 
of the same size. The example mask from Fig.|2]is shown in the left 
panel of Fig.|6]pixelized to a fixed resolution in the simple scheme. 

Alternatively, the stopping condition can be chosen to be a 
maximum number of polygons allowed in each pixel: if there are 
more than A^'max polygons in a given pixel, continue the recursion 
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Figure 6. Left: portion of SDSS mask from Fig.[2]pixelized to resolution 8 (4* total pixels on the sky) with the simple pixelization scheme. Pixel boundaries 
are shown in red/Ught gray. Right: the same mask pixelized with the simple pixelization scheme using the adaptive resolution method with a maximum of 30 
polygons per pixel. 



and divide those polygons into pixels at the next resolution level. 
This results in an adaptively pixelized mask, where higher resolu- 
tions are automatically used in regions of the mask that are more 
complicated. This method is especially useful for masks with vary- 
ing degrees of complexity in different areas. The example mask 
from Fig.|2]is shown again in the right panel of Fig. [6] pixelized 
adaptively with A'max = 30. 

The implementation of pixelization in MANGLE allows users 
to choose either of these methods and to select values for the fixed 
resolution level or for A^max. 



3.4 Speed trials 

In order to choose optimal default values for the maximum reso- 
lution and A^niax as well as to demonstrate the dramatic improve- 
ments in speed that pixelization provides, we have conducted a se- 
ries of speed trials of the new MANGLE software. 

There are two procedures we are interested in optimizing: 
firstly, basic processing of a mask detailed in !j2]involving pixeliza- 
tion, snapping, balkanization, and unification, and secondly, the use 
of the final mask to identify in which polygon a given point lies. We 
carried out trials of these two procedures using both the simple and 
SDSSPix pixelization schemes described in i|3.2| For each of these 
schemes, we tested both the fixed and adaptive resolution methods 
for stopping the pixelization algorithm described in i i3.3l For the 
fixed resolution method, we measured the time for several different 
values of the maximum resolution, and for the adaptive resolution 
method, we measured the time as a function of A^max • 

The results are shown in Fig.|7] (Note that the overhead time 
for reading and writing files and doing general setup has been sub- 
tracted - the times shown here are just for the primary operations.) 
From the fixed resolution trials, we see that the optimal resolution 
choice for the SDSS mask is the one that has approximately 10''' 
total pixels on the sky for both the simple and SDSSPix schemes. 
This corresponds to resolution 9 for the simple scheme and resolu- 



tion 6 for SDSSPix. When using the adaptive method, the choice 
for the maximum number of polygons allowed in each pixel that 
gives the fastest processing is A'max = 40 for the simple scheme 
and A'max = 46 for SDSSPix. Overall, the fastest choice (by a 
slight margin) for the SDSS mask is using the SDSSPix scheme 
with adaptive resolution. It is also interesting to note that different 
MANGLE processes have different optimum values - for example, 
snapping is fastest when there are fewer polygons in each pixel 
compared to balkanization. 

The impact of our pixelization algorithm is most clearly 
demonstrated by Fig. [T] This shows the processing time and poly- 
gon identification time for a series of selected portions of the SDSS 
mask as a function of the total number of polygons, both with and 
without pixelization - again, overhead time has been subtracted 
here. Pixelization clearly gives an improvement in speed that be- 
comes increasingly significant for larger numbers of polygons. 

To quantify this, we fit theoretical models to each series of tri- 
als. Without pixelization, the processing time for snapping, balka- 
nization, and unification is well-fitted by AN"^ where A is a free 
parameter with a best-fitted value of A = 3.2 x 10^'' CPU sec- 
onds. With pixelization (and using the adaptive resolution method 
with the optimal choice for A'max) this is reduced substantially. 
We fit the times using theoretical models based on the formu- 
las in fj3T| assuming M ~ A'^. We fit the pixelization time with 
BN log4 A'^ and the snapping, balkanization, and unification time 
with CN where B and C are free parameters. The simple scheme 
gives (B, C) = (0.0054, 4.7 x 10"*) and the SDSSPix scheme 
gives (B, C) = (0.0045, 4.8 x 10"'') (units are all in CPU sec- 
onds) - thus for processing the SDSS mask, the SDSSPix scheme 
is slightly faster than the simple scheme. Our fits give C /B — 0.1, 
so the overall processing time scales like O {N + O.IN log^ A''). 

These fitted curves are shown in Fig. [T] and allow us to ex- 
trapolate estimates for processing masks with larger numbers of 
polygons. For the full SDSS DR5 mask containing about 300,000 
polygons, pixelization reduces the processing time by a factor of 
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Figure 7. Time requii'ed to process the full SDSS DR5 mask with different choices of pixehzation schemes and methods. The "total" curves are the sum of the 
pixelization, snapping, balkanization, and unification curves. Also showfn is the time required to identify in which polygon each of the ~400,000 SDSS DR5 
galaxies lies (polyid), scaled up by a factor of 10. The plots on the left show time vs. the number of pixels at a fixed resolution. The plots on the right show 
time vs. Nma-K, the maximum number of polygons in allowed in each pixel when pixelizing with the adaptive resolution method. 



^1200. The improvement for future surveys will be even more dra- 
matic: the mask for the co-added LSST survey might contain ~10* 
polygons - pixelization reduces the processing time by a factor of 
~24,000. 

The time for identifying in which polygon each SDSS galaxy 
lies is significantly sped up as well: without pixelization it is fitted 
by DN where D — 0.012 CPU seconds, although the scatter is 
rather large due to dependence on which polygons are used. With 
pixelization, it is well-fitted by a constant (8.2 CPU seconds for the 
simple scheme, 7.9 CPU seconds for the SDSSPix scheme) - thus 
the time for polygon identification does not depend on how many 
polygons are in the mask if the number of pixels used is chosen to 
be roughly proportional to the number of polygons. For the SDSS 



mask, the time to identify the polygons for all of the SDSS galaxies 
is reduced by a factor of nearly 500. 



4 UNIFICATION WITH HEALPIX AND OTHER 
PIXELIZED TOOLS 

HEALPix (Gorski et al. 1999b, 2005) is a hierarchical, equal- 
area, isolatitudinal pixelization scheme for the sphere motivated 
by comput ational challenge s in analyzing CMB data (Gorski et 
al. 1999a, Bon d et alj[T999 ). Its base resolution consists of 12 
equal-area pixels; to generate higher resolutions, each pixel at a 
given resolution is hierarchically subdivided into four smaller pix- 
els, and it represents an interesting class of spherical projections 
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Figure 8. Left: Portion of SDSS mask as shown in Fig. |2] Right: Portion of SDSS mask from Fig. |2] as approximated by HEALPix pixels, rasterized to 
Af.ide = 512. 



jCalabretta & RoukemalllOOTh . Resolution in HEALPix is defined 
in terms of Nside, the number of divisions along the side of a base- 
resolution pixel required to reach the desired resolution. Because 
of the hierarchical definition of the higher resolution pixels, TV^idc 
is always a power of 2, and the total number of pixels at a given 
resolution is VIN^^^^. 

HEALPix is a very useful scheme in that it allows for fast and 
accurate astrophysical computations by means of appropriately dis- 
cretizing functions on the sphere to high resolution (Wan delt et al.i 
1 19981 Dore et al. 2001b). In particular, HEALPix includes routines 
for fas t computations of spherical harmonics (Pore, Knox, & Pee l 
2001a: lHivon et alj2002l:lwandelt et al.l200ll:ISzaDudi et alj200lh . 
It is widely used in the analysis of cos mic microwave background 
data from WMAP (iSpergel et alj|2007h and has recently been use d 
to approximate galaxy survey masks as well jPercival et alj200^ . 

Combining HEALPix with MANGLE is useful because it fa- 
cilitates comparisons between galaxy survey data and the CMB 
as is done in experiments measuring the integrated Sachs- 



Wolfe effect jPadmanabhan et al.ll2005l : iGiannantonio et alj|2006l: 
Rassat et al. 20071). the Su nyaev-Zel'dovich effect i Myers et al.l 



2004lReid & Sperge l'2006^. etc., and can also be used for generat- 



ing mas ks that block out regio ns of high dust extinction from galaxy 
surveys dSchlegelet alii 19981 dust map available in HEALPix for- 
mat onlin^fl)- 

In general, it can be applied to any task requiring comparison 
between a piecewise-constant function on the sphere to a contin- 
uous function sampled on a discretized spherical grid. Converting 
an angular mask into HEALPix format also allows for rapid compu- 
tations of approximate spherical harmonics of the mask using the 
existing HEALPix tools. 

The implementation of the HEALPix scheme in MANGLE con- 
sists of two components: 



^ |http : / /lambda ■ gsf c ■ nasa ■ gov/product/ f oreground/ebvjna- 



(i) A new polygon format, "healpix.weight", which allows the 
user to input a list of weights corresponding to each HEALPix pixel 
at a given A^'sidc parameter; 

(ii) A new utility, "rasterize", which essentially allows the user 
to pixelize a mask against the HEALPix pixels, by means of a dif- 
ferent technique than the pixelization method described in ij3] 

Together, these new features allow effective two-way conversion 
between the HEALPix specifications and those of MANGLE. 



4.1 Importing HEALPix maps into mangle 

The structure of the healpix_weight format is quite simple: an input 
file consists of a list of numbers corresponding to the weight of each 
HEALPix pixel at a given A^ aido parameter, usin g the nested num- 
bering scheme described in iGorski et alj j2005l) . In addition, the 
definition of the A'^sidc parameter is extended to include 0, which 
corresponds to a single pixel covering the entire sphere. MANGLE 
constructs polygons approximately equivalent to the HEALPix pix- 
els through the following procedure: 

(i) The exact azimuth and elevation of each vertex of a given 
pixel are calculated using the HEALPix utility "pix2vec_nest"; 

(ii) The exact azimuth and elevation of each vertex of the four 
child pixels of the current pixel are calculated using the same utility, 
four of which are the midpoints of the edges of the current pixel; 

(iii) The four vertices of the given pixel are combined with the 
four midpoints to construct the current pixel. Each edge is defined 
by the circle that passes through the t wo vertices and the midpoin t, 
using the "edges" format described in Hamilton & Tegmarkl ( [2004h . 

(iv) To eliminate spurious antipodal pieces of the polygon defin- 
ing the pixel, a fifth cap is constructed whose axis coordinates are 
the exact centre of the current pixel and whose radius is 10"'' radi- 
ans greater than the distance from the centre of the pixel to any of 

gheAa ar vertices — this cap thus encloses the entire pixel. 
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Figure 9. The new version of mangle can output angular masks in HEALPi.x format, which allows for easy comparisons to CMB and other sky map data and 
allows users to take advantage of existing HEALPix tool s. Top: The SPSS DR5 completeness mask, rasterized and plotted using HEALPix routines. Middle: 
The final 2dFGRS coinpleteness mask as determined in [Hamilton et al.l j20O8|), rast erized and plotted using HEALPix routines. Bottom: CMB temperature 
difference map measured by WMAP channel 4, with units in mK. ISpergeletal] ilOOTll All three inaps are shown at Nsida = 512 in celestial coordinates. 
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A similar technique could be applied to incorporate other pixeliza- 
tion schemes not exactly described by spherical polygons as well. 
It is important to note that the pixels constructed in this manner 
are not exactly equivalent to the actual HEALPix pixels - while 
the hierarchical and isolatitudinal properties of the HEALPix pix- 
els are preserved, the equal area property is not. For example, at 
A^'sidc ~ 1, the areas of the approximate pixels differ on average 
from the actual area by about 0.08%. 

However, as the A'sidc parameter increases, this difference 
decreases rapidly: at A'^sido ~ 512, the average difference from 
the actual area is 0.000002%. The boundaries of the both the ac- 
tual HEALPix pixels and our circles approximating them become 
straight lines in the flat-sky approximation, i.e., when the pixel 
size is much less than 1 radian. Thus for the resolutions at which 
HEALPix is typically used, the difference is totally negligible. 

In applications involving integrations over the sphere, such as 
calculating spherical harmonics, the slight area differences between 
the pixels can be corrected for in a straightforward manner since 
the area of each pixel is known (and can be easily extracted using 
mangle's "area" format): simply multiply the value of the func- 
tion in each pixel by the area of that pixel divided by the average 
pixel area, and then the HEALPix spherical harmonics routines will 
be exact. Furthermore, the paranoid user can get higher precision 
by rasterizing to a higher resolution and then using the "ud_grade" 
HEALPix utility to obtain results for lower resolutions. 

4.2 Exporting polygon files as HEALPix maps 

The idea behind the rasterization method used to convert polygon 
files into HEALPix maps is very similar to that of pixelization: to 
split up the polygons that comprise a mask using a given set of pix- 
els, such that afterward each polygon lies in only one pixel. How- 
ever, rasterization is somewhat different, in that afterward the con- 
verse statement also holds: each pixel contains only one polygon, 
namely, itself. In particular, rasterization uses an arbitrary user- 
defined spherical pixelization as the pre-determined scheme against 
which to split up the polygons in a given input mask. In general, 
the user-defined "rasterizer" pixels may be from any pixelization 
of the sphere, but the method was originally developed for use with 
the approximate HEALPix pixels described in the previous section. 
From a practical standpoint, it would be simplest to implement any 
spherical pixelization that can be exactly represented as spherical 
polygons by following the steps in 0.2.31 however, for pixeliza- 
tion schemes that do not possess this property (such as HEALPix), 
rasterization provides an alternate method of implementation. 

The final product of rasterization is a polygon file in which 
each polygon corresponds to one of the rasterizer pixels (either the 
approximate HEALPix pixels or a user-defined set of pixels) and 
its weight is the area-averaged weight of the input angular mask 
within that pixel. This output can easily be converted into a FITS 
file read by the HEALPix software using a simple script provided 
with MANGLE. 

Rasterization consists of the following steps: 

(i) Calculate the area of each rasterizer pixel; 

(ii) Compute the area of the intersection of each input mask 
polygon with each rasterizer (e.g., HEALPix) pixel; 

(iii) Calculate the area-averaged weight within each rasterizer 
pixel. 

Note that, as with snapping, balkanization, and unification, this pro- 
cedure is greatly accelerated by pixelizing both the rasterizer poly- 
gons and the polygons defining the mask to the same resolution 



using one of the pixelization schemes described in i]3.2l e.g. simple 
or SDSSPix. Then step (ii) in the above procedure then involves 
comparing only polygons in the same simple/SDSSPix pixel. Our 
example mask from Fig. |2] (duplicated in the left panel of Fig. [Sj 
is shown in the right panel of Fig. [8] rasterized with HEALPix 
pixels at A'^sidc = 512. The ability to convert between MANGLE 
and HEALPix formats allows for straightforward comparisons be- 
tween different types of functions defined on the sphere, e.g. angu- 
lar masks of galaxy surveys and the CMB, as illustrated in Fig. [9] 



5 SUMMARY 

As technologies for surveying the sky continue to improve, the pro- 
cess of managing the angular masks of galaxy surveys grows ever 
more complicated. The primary purpose of this paper has been to 
present a set of dramatically faster algorithms for completing these 
tasks. 

These algorithms are based on dividing the sky into regions 
called "pixels" and performing key operations only within each 
pixel rather than across the entire sky. The pixelization is based 
on a hierarchical subdivision of the sky - this produces a quadtree 
data structure that keeps track of which polygons are nearby each 
other. The preprocessing step of pixelization is O {N log N) for a 
mask with A'^ polygons and ~A'^ pixels, and it reduces the mask 
processing time from O {N^^ to C (A'^). Furthermore, it reduces 
the time required to locate a point within a polygon from O (N) to 
Oil). 

This method is exact, i.e., it does not make a discrete approx- 
imation to the mask, and it takes only a tiny fraction of the com- 
putation time. It accelerates the processing of the SDSS mask by a 
factor of about 1200 and reduces the time to locate all of the galax- 
ies in the SDSS mask by a factor of nearly 500. It will provide 
even more dramatic gains for future surveys: processing time for 
the LSST large-scale structure mask could be reduced by a factor 
of about 24,000. 

We have also described a method for converting between 
masks described by spherical polygons and sky maps in the 
HEALPix format commonly used by CMB and large-scale struc- 
ture experiments. This provides a convenient way to work with both 
piecewise-constant functions on the sky such as the completeness 
of a galaxy survey and continuously varying sky maps such as the 
CMB temperature. Converting angular masks into HEALPix format 
also allows users to take advantage of existing HEALPix tools for 
rapidly computing spherical harmonics. 

All of the new algorithms and features detailed 
here have been integrated into the MANGLE soft- 
ware suite, wiiich is available for free download at 
Ihttp :/ /space .mi t . edu / home /tegmark /mangle/ 
This updated software package should prove increasingly useful 
in the coming years, especially as next-generation surveys such as 
DBS, WFMOS, Pan-STARRS, and LSST get underway. 
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